SUPPLEMENTAL MATERIAL FOR: Implications of Functional Similarity for Gene Regulatory Interactions
Abstract
Many functional classification schemes have been proposed besides the Gene Ontology [20][19][18][3][9]. Here we choose two to contrast with the Gene Ontology database: (1) the E. coli genome and proteome (GenProtEC) database [18], and (2) the Clusters of Orthologous Groups of proteins (COG) database [20]. We chose GenProtEC because our investigation focuses on E. coli and this database is dedicated specifically to the functions performed by this organism. We chose COG as an example of a database that records gene properties in a manner which should be, on the whole, distinctly different from the Gene Ontology. We downloaded annotation information from these databases’ websites [1][2] and used it to construct two bipartite graphs, one for each database, in the same manner as with the E. coli annotations in the Gene Ontology (see main text, Section 1.2.1).
These two databases are, in general, smaller than the Gene Ontology. GenProtEC contains 23137 annotations from 3361 genes to 594 functional categories (a density of just over 1%). COG assigns each gene to exactly one orthologous group (although some orthologous groups contain several genes), which results in 3450 annotations from 3450 genes to 2131 orthologous groups (a density of only 0.05%). By contrast, the Gene Ontology contains 119936 gene-term annotations between 3794 E. coli genes and 3882 functional categories (a density of 0.8%). Figure S1 shows the degree distributions of “terms” and genes in the Gene Ontology and in these two databases. In all three, the degree distribution of the functional categories (or orthologous groups in COG) is heavy-tailed. This reinforces our belief that taking the degree of a functional category into account is important when designing a measure that accurately reflects the functional similarity between two genes.
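The quoted densities can be checked with a short calculation. The sketch below assumes that "density" means the number of annotations divided by the number of possible gene-category pairs in the bipartite graph; this reading reproduces the percentages given above, but the function name and the dictionary layout are illustrative only.

```python
def bipartite_density(n_annotations, n_genes, n_categories):
    """Fraction of all possible gene-category edges that are actually present.

    Assumption: density = annotations / (genes * categories).
    """
    return n_annotations / (n_genes * n_categories)


# (annotations, genes, categories) as reported in the text
databases = {
    "GenProtEC": (23137, 3361, 594),
    "COG": (3450, 3450, 2131),
    "Gene Ontology": (119936, 3794, 3882),
}

densities = {
    name: bipartite_density(*counts) for name, counts in databases.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.3%}")
```

Because COG assigns each gene to exactly one group, its density reduces to one over the number of orthologous groups (1/2131).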
COG’s construction implies that every gene has the same degree; it is interesting, however, that the degree distributions of genes in the Gene Ontology and in GenProtEC show the same basic behavior. We used the bipartite graphs constructed from these two databases to calculate a scaled similarity and Kappa statistic between pairs of genes. The smaller database size and sparse construction of COG are evident in the results. Using annotations from the GenProtEC database, we calculated a scaled similarity and Kappa statistic for 3086481 of a possible 5646480 gene-pairs (55%), but using annotations from COG we could calculate them for only 2848 of a possible 5949525 gene-pairs (0.048%). This is, again, in contrast to the Gene Ontology, where we can estimate a scaled similarity and Kappa statistic for 6713626 of a possible 7195321 gene-pairs (93%). Any gene-pair without a score is given a default value of zero. We calculated the maximum F-score for both the scaled similarity and the Kappa statistic in each of these databases, using RegulonDB as our gold standard (see the main text, Section 2.2, for more information on how we calculated the F-score). Because each database can assign a score to a different percentage of edges, the absolute value of the F-score varies quite broadly. For example, using annotations from COG we can estimate a functional score for fewer edges than actually appear in our gold standard (and only a subset of these extend from a transcription factor). As a result, the maximum F-score in this database occurs when only a very few edges (18 and 147 for the scaled similarity and Kappa statistic, respectively) are used to define the “true positive” and “false negative” classes.
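The gene-pair totals above are the number of unordered pairs among the annotated genes, n(n-1)/2, and the maximum F-score can be pictured as a sweep over the ranked, scored pairs. The following is a minimal sketch under stated assumptions, not the procedure of Section 2.2 of the main text (which is not reproduced here): `pair_coverage` and `max_f_score` are hypothetical helpers, pairs are represented as plain tuples, ties in score are broken arbitrarily, and pairs left at the default score of zero are never counted as predictions.

```python
from math import comb


def pair_coverage(n_scored_pairs, n_genes):
    """Return (possible unordered gene-pairs, fraction that received a score)."""
    total = comb(n_genes, 2)  # n * (n - 1) / 2
    return total, n_scored_pairs / total


def max_f_score(scores, gold):
    """Sweep the top-k scored pairs and return (best F-score, best k).

    scores: dict mapping a pair to its similarity score (hypothetical layout)
    gold:   set of pairs present in the gold standard
    Pairs with a score of zero are treated as unscored and skipped.
    """
    ranked = sorted(
        (pair for pair, score in scores.items() if score > 0),
        key=lambda pair: scores[pair],
        reverse=True,
    )
    best_f, best_k, true_pos = 0.0, 0, 0
    for k, pair in enumerate(ranked, start=1):
        if pair in gold:
            true_pos += 1
        if true_pos == 0:
            continue  # F-score is zero until the first true positive
        precision = true_pos / k
        recall = true_pos / len(gold)
        f_score = 2 * precision * recall / (precision + recall)
        if f_score > best_f:
            best_f, best_k = f_score, k
    return best_f, best_k


# Pair totals reported in the text
for name, scored, genes in [
    ("GenProtEC", 3086481, 3361),
    ("COG", 2848, 3450),
    ("Gene Ontology", 6713626, 3794),
]:
    total, fraction = pair_coverage(scored, genes)
    print(f"{name}: {scored} of {total} pairs ({fraction:.3%})")

# Toy data, invented purely for illustration
toy_scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.1}
toy_gold = {("a", "b"), ("b", "c"), ("c", "d")}
toy_best_f, toy_best_k = max_f_score(toy_scores, toy_gold)
```

In this sketch recall is bounded by the number of scored pairs, which illustrates why a database that scores fewer pairs than the gold standard contains, as COG does, can only reach a low maximum F-score at a small cutoff.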